Visualising summary statistics
Humboldt-Universität zu Berlin
2023-06-19
Last week we…
- used summarise() from dplyr ✅
- created groups (.by =) ✅
This week we will learn how to…
- use facet_wrap() to plot more than three variables
Section 2.5 (Visualising relationships) in Wickham et al. (n.d.)
Ch. 4 (Representing summary statistics) in Nordmann et al. (2022)
Figure 1: Different plot types to visualise the distribution of raw data: histogram (A), density plot (B), scatterplot (C), stacked barplot (D), and dodged barplot (E)
‘Mirrored’ density plot
What does ‘mirrored’ density plot mean? A violin plot is literally just a double-sided density plot. Compare Figure 2 to the corresponding density plot: they show the same data and the same distribution, but the violin plot draws the density on both sides of a centre line and does not print the density values along the axis.
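The comparison can be tried out with a minimal sketch like the one below. It assumes that df_penguins is the palmerpenguins data with missing values removed (the slides do not show how df_penguins was created), and the object names p_density and p_violin are only illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Assumption: df_penguins is the penguins data without missing values
df_penguins <- na.omit(penguins)

# Density plot: the density values are printed along the y-axis
p_density <- ggplot(df_penguins, aes(x = body_mass_g, fill = sex)) +
  geom_density(alpha = .5)

# Violin plot: the same density, mirrored on both sides, one violin per group
p_violin <- ggplot(df_penguins, aes(x = sex, y = body_mass_g, fill = sex)) +
  geom_violin(alpha = .5)

p_density
p_violin
```

Note that body mass moves from the x-axis (density plot) to the y-axis (violin plot), so the violins look like the density curves turned on their side.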
- a third variable can be mapped to another aesthetic (colour, fill, or shape)
- the plots above use colour (all plots) and shape (scatterplot) to visualise species or sex in addition to what was mapped along the x- and y-axes
- four variables in one plot: flipper_length_mm (x-axis), body_mass_g (y-axis), species (colour), island (shape)
facet_wrap()
- use facet_wrap() to divide Figure 5 into three panels, by island
- what does facet_wrap() take as its argument(s)?
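One possible answer, as a hedged sketch: facet_wrap() accepts a one-sided formula (or vars()) naming the variable to split by. The code assumes df_penguins is the palmerpenguins data without missing values; the exact aesthetics of Figure 5 are not shown in the slides, so the mapping below is a guess based on the variables listed above.

```r
library(ggplot2)
library(palmerpenguins)

# Assumption: df_penguins is the penguins data without missing values
df_penguins <- na.omit(penguins)

p_wrap <- ggplot(df_penguins,
                 aes(x = flipper_length_mm, y = body_mass_g,
                     colour = species)) +
  geom_point() +
  # One panel per island; facet_wrap(vars(island)) is equivalent
  facet_wrap(~island)

p_wrap
```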
facet_grid()
facet_wrap() is related to facet_grid(), which can take two categorical variables, one in rows and one in columns. The argument for facet_grid() is a formula: rows ~ columns. So, if we add facet_grid(sex ~ island) to our plot, we should see the data in panels grouped by sex in rows (one row for female, one row for male) and by island in columns (one column for each island).
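A minimal sketch of the facet_grid(sex ~ island) call described above, again assuming df_penguins is the palmerpenguins data without missing values and using an illustrative scatterplot as the base plot:

```r
library(ggplot2)
library(palmerpenguins)

# Assumption: df_penguins is the penguins data without missing values
df_penguins <- na.omit(penguins)

p_grid <- ggplot(df_penguins,
                 aes(x = flipper_length_mm, y = body_mass_g,
                     colour = species)) +
  geom_point() +
  # Rows: sex (female, male); columns: island (one per island)
  facet_grid(sex ~ island)

p_grid
```

With two levels of sex and three islands this should give a 2 x 3 grid of panels, some of which contain only one or two species.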
df_penguins (body mass by sex)
Image source: (winter_statistics_2019?) (all rights reserved)
Or, explained another way:
Image source: Wickham et al. (n.d.) (all rights reserved)
geom_boxplot()
What arguments does geom_boxplot() take?
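As a starting point for exploring this, here is a minimal boxplot of body mass by sex. It assumes df_penguins is the palmerpenguins data without missing values; the fill mapping is only one of several options.

```r
library(ggplot2)
library(palmerpenguins)

# Assumption: df_penguins is the penguins data without missing values
df_penguins <- na.omit(penguins)

p_box <- ggplot(df_penguins,
                aes(x = sex, y = body_mass_g, fill = sex)) +
  # Box = interquartile range, thick line = median, points = outliers
  geom_boxplot()

p_box
```

Running ?geom_boxplot in the Console lists the arguments the function accepts.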
colour or fill aesthetic
geom_point()
geom_errorbar()
df_penguins (body mass by sex)
body_mass_g by species and sex
What are the mean and sd of body_mass_g by species and sex?

| species | sex | mean | sd | N |
|---|---|---|---|---|
| Adelie | female | 3368.836 | 269.3801 | 73 |
| Adelie | male | 4043.493 | 346.8116 | 73 |
| Chinstrap | female | 3527.206 | 285.3339 | 34 |
| Chinstrap | male | 3938.971 | 362.1376 | 34 |
| Gentoo | female | 4679.741 | 281.5783 | 58 |
| Gentoo | male | 5484.836 | 313.1586 | 61 |
ggplot2
knitr and kableExtra
Two options: creating a new object, or feeding the summary into ggplot directly with a pipe
# Create new object with summaries
sum_penguins <- df_penguins %>%
  summarise(mean = mean(body_mass_g),
            sd = sd(body_mass_g),
            upper = mean + sd,  # uses the mean and sd columns computed above
            lower = mean - sd,
            N = n(),
            .by = c(species, sex)) %>%
  arrange(species, sex)
# Feed new object into ggplot
sum_penguins %>%
  ggplot(aes(x = sex, y = mean, colour = species))
geom_point()
geom_errorbar()
- geom_errorbar() requires aes(ymin = mean-sd, ymax = mean+sd)
- summarise() already computed mean-sd (lower) and mean+sd (upper) for each group for us, so we can use those instead
Barplot of mean: stay away!
I implore you, do not plot means using barplots! You will very often see barplots of mean values, and others might even teach this in other courses, but there are lots of reasons why this is a bad idea.
Firstly, they can be very misleading. They start at 0 and give the impression that data stop at the mean, when about half the data is (usually) above the mean.
datasauRus package, which contains datasets with similar means, standard deviations, and number of observations
x and y
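The point about similar summary statistics can be checked with a short sketch. It assumes the datasaurus_dozen data frame from the datasauRus package (columns dataset, x, and y); the object name sum_dozen is only illustrative.

```r
library(dplyr)
library(datasauRus)

# Summary statistics of x and y for each of the datasets
sum_dozen <- datasaurus_dozen %>%
  summarise(mean_x = mean(x),
            mean_y = mean(y),
            sd_x = sd(x),
            sd_y = sd(y),
            N = n(),
            .by = dataset)

sum_dozen
```

The summaries come out nearly identical across datasets, yet a scatterplot of x and y faceted by dataset shows very different shapes, which is exactly the information a barplot of means would hide.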
- position = position_dodge(0.3) tells ggplot2 how to position objects
- position_dodge() means: move overlapping objects horizontally
- add position_dodge() for every geom_ that is supposed to be at the same location, and with the same value; otherwise they won’t be aligned
- geom_point(size = 3): adjust the size of the points
- geom_errorbar(width = .3): adjust the width of the errorbars
- use the same value for position_dodge() and geom_errorbar(width = ); this way the errorbars always touch the ‘middle’ line (try changing either value to see what I mean)
- scale_colour_colorblind(): use a colorblind-friendly colour scheme
- theme_minimal(): cleans up the plot (we’ve also seen theme_bw())
geoms
Add geom_point() with the data and aes() needed
sum_penguins %>%
  ggplot(aes(x = species, y = mean,
             colour = sex, shape = sex)) +
  # raw data: needs its own data and y aesthetic
  geom_point(data = df_penguins,
             aes(x = species, y = body_mass_g)) +
  # group means
  geom_point(position = position_dodge(0.3),
             size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(0.3),
                width = .3) +
  scale_colour_colorblind() +
  theme_minimal()
Add position_dodge()
sum_penguins %>%
  ggplot(aes(x = species, y = mean,
             colour = sex, shape = sex)) +
  # raw data, dodged by the same amount as the means
  geom_point(data = df_penguins,
             aes(x = species, y = body_mass_g),
             position = position_dodge(0.3)) +
  geom_point(position = position_dodge(0.3),
             size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(0.3),
                width = .3) +
  scale_colour_colorblind() +
  theme_minimal()
Add an alpha value
sum_penguins %>%
  ggplot(aes(x = species, y = mean,
             colour = sex, shape = sex)) +
  # raw data, made semi-transparent
  geom_point(data = df_penguins,
             aes(x = species, y = body_mass_g),
             position = position_dodge(0.3),
             alpha = .4) +
  geom_point(position = position_dodge(0.3),
             size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(0.3),
                width = .3) +
  scale_colour_colorblind() +
  theme_minimal()
position_jitterdodge() moves objects so that they do not overlap
- dodge.width = .3 to match position_dodge() of errorbars
- jitter.width = to say how much we want the points to jitter
- geom_errorbar(linewidth = 1) makes the errorbar lines thicker (since ggplot2 3.4.0, line thickness is set with linewidth rather than size)
sum_penguins %>%
  ggplot(aes(x = species, y = mean,
             colour = sex, shape = sex)) +
  # raw data, jittered and dodged so points do not overlap
  geom_point(data = df_penguins,
             aes(y = body_mass_g),
             position = position_jitterdodge(dodge.width = .3,
                                             jitter.width = 0.3),
             alpha = .4) +
  geom_point(position = position_dodge(width = 0.3),
             size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(0.3),
                width = .3,
                linewidth = 1) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  scale_colour_colorblind() +
  theme_minimal()
Today we learned how to…
- use facet_wrap() to plot more than three variables ✅
Create a plot of the df_penguins data, with:
- sex plotted on the x axis and with colour or fill (choose one)
- flipper_length_mm plotted along the y axis
- island plotted in three panels using facet_wrap()
- a theme_ setting you choose (e.g., theme_bw())
Add a label to the figure (fig-...) and a caption (fig-cap:). Briefly describe the plot, using a cross-reference (@fig-... shows that…).
Add another geom_ and some labels to Figure 12.
Figure 17: A multi-layered plot
Using the patchwork package (see week 3 notes), plot your boxplot and your errorbar/violin plots side by side. It should look something like Figure 18.
Hint: add + plot_annotation(tag_levels = "A") from patchwork
Figure 18: Combined plots with patchwork
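The mechanics of combining plots can be sketched as follows. This is not a solution to the exercise: it uses two simple placeholder plots, assumes df_penguins is the palmerpenguins data without missing values, and the object names are only illustrative.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

# Assumption: df_penguins is the penguins data without missing values
df_penguins <- na.omit(penguins)

# Two placeholder plots to combine
p_left <- ggplot(df_penguins, aes(x = sex, y = flipper_length_mm)) +
  geom_boxplot()
p_right <- ggplot(df_penguins, aes(x = sex, y = flipper_length_mm)) +
  geom_violin()

# `+` places the plots side by side; tag_levels = "A" labels them A, B, ...
p_combined <- p_left + p_right +
  plot_annotation(tag_levels = "A")

p_combined
```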
Made with R version 4.3.0 (2023-04-21) (Already Tomorrow) and RStudio version 2023.3.0.386 (Cherry Blossom).
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magick_2.7.4 patchwork_1.1.2 ggthemes_4.2.4
[4] palmerpenguins_0.1.1 here_1.0.1 lubridate_1.9.2
[7] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
[10] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[13] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.3 generics_0.1.3 xml2_1.3.4
[4] lattice_0.21-8 stringi_1.7.12 hms_1.1.3
[7] digest_0.6.31 magrittr_2.0.3 evaluate_0.21
[10] grid_4.3.0 timechange_0.2.0 fastmap_1.1.1
[13] Matrix_1.5-4 rprojroot_2.0.3 jsonlite_1.8.5
[16] httr_1.4.6 rvest_1.0.3 mgcv_1.8-42
[19] fansi_1.0.4 viridisLite_0.4.2 scales_1.2.1
[22] cli_3.6.1 rlang_1.1.1 splines_4.3.0
[25] munsell_0.5.0 withr_2.5.0 yaml_2.3.7
[28] tools_4.3.0 tzdb_0.4.0 colorspace_2.1-0
[31] webshot_0.5.4 pacman_0.5.1 kableExtra_1.3.4.9000
[34] png_0.1-8 vctrs_0.6.2 R6_2.5.1
[37] lifecycle_1.0.3 pkgconfig_2.0.3 pillar_1.9.0
[40] gtable_0.3.3 glue_1.6.2 Rcpp_1.0.10
[43] systemfonts_1.0.4 highr_0.10 xfun_0.39
[46] tidyselect_1.2.0 rstudioapi_0.14 knitr_1.43
[49] farver_2.1.1 datasauRus_0.1.6 nlme_3.1-162
[52] htmltools_0.5.5 svglite_2.1.1 rmarkdown_2.22
[55] labeling_0.4.2 compiler_4.3.0
Week 10 - Data visualisation 2